This is a simple example of Reinforcement Learning using Python and the OpenAI Gym library.
Reinforcement Learning is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent takes actions in the environment, receives feedback in the form of rewards or penalties, and uses this feedback to improve its decision-making over time. The goal is for the agent to learn a policy that maximizes the cumulative reward over time.
Key concepts of Reinforcement Learning:
Reinforcement Learning is applied in various domains, including robotics, game playing, and autonomous systems.
Python Source Code:
# Import necessary libraries
import gym
import numpy as np
# Create the FrozenLake environment
env = gym.make('FrozenLake-v1')
# Initialize the Q-table with zeros
Q = np.zeros([env.observation_space.n, env.action_space.n])
# Set hyperparameters
learning_rate = 0.8
discount_factor = 0.95
num_episodes = 2000
# Implement the Q-learning algorithm
for episode in range(num_episodes):
state = env.reset()
total_reward = 0
done = False
while not done:
# Choose an action using epsilon-greedy strategy
if np.random.rand() < 0.5:
action = env.action_space.sample() # Exploration
else:
action = np.argmax(Q[state, :]) # Exploitation
# Take the chosen action and observe the next state and reward
next_state, reward, done, _ = env.step(action)
# Update the Q-value using the Q-learning formula
Q[state, action] = (1 - learning_rate) * Q[state, action] + \
learning_rate * (reward + discount_factor * np.max(Q[next_state, :]))
# Move to the next state
state = next_state
# Accumulate the total reward for this episode
total_reward += reward
# Print the total reward for each episode
if (episode + 1) % 100 == 0:
print(f"Episode {episode + 1}/{num_episodes}, Total Reward: {total_reward}")
# Evaluate the learned policy
total_reward = 0
num_eval_episodes = 10
for _ in range(num_eval_episodes):
state = env.reset()
done = False
while not done:
action = np.argmax(Q[state, :])
next_state, reward, done, _ = env.step(action)
total_reward += reward
state = next_state
# Print the average reward over evaluation episodes
average_reward = total_reward / num_eval_episodes
print(f"Average Reward over {num_eval_episodes} Evaluation Episodes: {average_reward}")
Explanation: